fix: parse date values #150
Conversation
@jakevdp Should we apply this fix in Altair as well, in https://github.com/altair-viz/altair/blob/bd593867f4413936ab723b7583a508aa8c13e06d/altair/utils/core.py?
Yeah, this could be added to Altair, but storing Python date types in object arrays is really an anti-pattern. It is far better and more performant to use pandas datetimes directly, e.g. df.Year = pd.to_datetime(df.Year, format='%Y')
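The conversion suggested above can be sketched as follows (the DataFrame and its "Year" column are hypothetical, used only to illustrate the object-dtype anti-pattern and the fix):

```python
import datetime

import pandas as pd

# Hypothetical DataFrame whose "Year" column holds Python date objects,
# producing an object-dtype column (the anti-pattern discussed above).
df = pd.DataFrame({"Year": [datetime.date(2017, 1, 1), datetime.date(2018, 1, 1)]})
print(df["Year"].dtype)  # object

# Convert to native pandas datetimes, as suggested:
df["Year"] = pd.to_datetime(df["Year"])
print(df["Year"].dtype)  # datetime64[ns]
```

With a datetime64 column, serialization to Vega-Lite needs no special casing for Python date objects.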
vega/utils.py (outdated)

    if isinstance(val, np.ndarray):
        return val.tolist()
    elif isinstance(val, datetime.date):
        return "{dt:%Y-%m-%d}".format(dt=val)
Date parsing has to be done very carefully at the interface between pandas and Vega-Lite: these dates should be encoded in full ISO format, or else you will run into bugs like this one: vega/altair#1027
I completely agree that numpy types are strongly preferred over Python date types. Still, I don't think it is unreasonable to expect a column of Python dates to be serialized into valid JSON for plotting.
Adding this support will benefit anyone converting an Apache Spark DataFrame to a pandas DataFrame, because DateType columns are cast to Python dates. See SPARK-23290 for the reasoning behind this design choice.
I did add an explicit cast to ISO format.
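As context for why an explicit cast is needed at all, a minimal standard-library sketch of what happens when a raw date object reaches the JSON encoder:

```python
import datetime
import json

# Python's json module cannot serialize date objects directly:
try:
    json.dumps({"d": datetime.date(2019, 1, 1)})
except TypeError as exc:
    print(exc)  # date objects are not JSON serializable

# So the serializer must convert them to strings first:
print(json.dumps({"d": datetime.date(2019, 1, 1).isoformat()}))
```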
vega/utils.py (outdated)

    @@ -48,7 +48,7 @@ def parse_object_column_type(val):
         if isinstance(val, np.ndarray):
             return val.tolist()
         elif isinstance(val, datetime.date):
    -        return "{dt:%Y-%m-%d}".format(dt=val)
    +        return val.isoformat()
Unfortunately, a partial ISO format like "2019-01-01" is not sufficient to ensure that JavaScript date parsing is consistent with the expectations of pandas interfacing with Vega-Lite. You need the full ISO format, like "2019-01-01T00:00:00".
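The distinction can be sketched with the standard library alone (the combine-with-midnight trick is an assumption about how to produce the full form, not code from this PR):

```python
import datetime

val = datetime.date(2019, 1, 1)

# Partial ISO format: JavaScript's Date() parses date-only strings as
# UTC midnight, which can shift the rendered date in non-UTC timezones.
print(val.isoformat())  # 2019-01-01

# Full ISO format with a time component: parsed as local time,
# matching what pandas and Vega-Lite expect.
full = datetime.datetime.combine(val, datetime.time()).isoformat()
print(full)  # 2019-01-01T00:00:00
```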
Thank you for clearing this up for me; I have written more than a couple of hacks dealing with this.
Just to check: do we need to add time zone information to the generated string?
The method used in Altair and highlighted in vega/altair#1027 does not add a timezone to the ISO format string when converting numpy types. I did some exploration of this and my dates plot as expected, so I do not think it is necessary. That said, @jakevdp has done more work on this than I have, so I will defer to him.
Can you help finish this pull request @johnmdonich?
For sure. It is finished except for a squash. I was just waiting for approval/review from @jakevdp.
I can squash when I merge. |
@jakevdp @johnmdonich What's left in this pull request? How can I help push it over the finish line?
@johnmdonich could you redo this pull request?
Please send a new pull request. |
Add support for datetime.date values in pandas DataFrames, as described in #70.

Pandas types the SQL DATE type as an object column holding standard-library datetime.date values. This also comes up when converting an Apache Spark DataFrame with DateType() columns. A simple check for date values in the object-column apply() function is all that is needed.
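Putting the thread's conclusion together, a sketch of the serializer with the date check (the function name follows the diff above; the surrounding apply() usage and the full-ISO conversion are assumptions based on the review comments, not the merged code):

```python
import datetime

import numpy as np


def parse_object_column_type(val):
    """Serialize one value from an object-dtype column for Vega-Lite."""
    if isinstance(val, np.ndarray):
        return val.tolist()
    elif isinstance(val, datetime.date):
        # Use the full ISO format ("2019-01-01T00:00:00") so JavaScript
        # parses the value as local time rather than UTC midnight.
        return datetime.datetime.combine(val, datetime.time()).isoformat()
    return val


# An object column could then be converted with Series.apply, e.g.:
# df["when"] = df["when"].apply(parse_object_column_type)
print(parse_object_column_type(datetime.date(2019, 1, 1)))
```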